-
Self-reported biographical strings on social media profiles provide a powerful tool to study self-identity. We present HINENI, a dataset of 420 million Twitter user profiles collected over a 12-year period, partitioned into 32 distinct national cohorts, which we believe is the largest publicly available data resource for identity research. We report on the major design decisions underlying HINENI, including a new notion of sampling (k-persistence) which spans the divide between traditional cross-sectional and longitudinal approaches. We demonstrate the power of HINENI to study the relative survival rate (half-life) of different tokens, and the use of emoji analysis across national cohorts to study the effects of gender, national, and sports identities.
-
Trust is predictive of civic cooperation and economic growth. Recently, the U.S. public has demonstrated increased partisan division and a surveyed decline in trust in institutions. There is a need to quantify individual and community levels of trust unobtrusively and at scale. Using observations of language across more than 16,000 Facebook users, along with their self-reported generalized trust score, we develop and evaluate a language-based assessment of generalized trust. We then apply the assessment to more than 1.6 billion geotagged tweets collected between 2009 and 2015 and derive estimates of trust across 2,041 U.S. counties. We find generalized trust was associated with more affiliative words (love, we, and friends) and less angry words (hate and stupid) but only had a weak association with social words primarily driven by strong negative associations with general othering terms (“they” and “people”). At the county level, associations with the Centers for Disease Control and Prevention (CDC) and Gallup surveys suggest that people in high-trust counties were physically healthier and more satisfied with their community and their lives. Our study demonstrates that generalized trust levels can be estimated from language as a low-cost, unobtrusive method to monitor variations in trust in large populations.
-
Cotfas, Liviu-Adrian (Ed.) Personally expressed identity is who or what an individual themselves says they are, and it should be studied at scale. At scale means with data on millions of individuals, which is newly available and comes timestamped and geocoded. This work introduces a dataset for the study of identity at scale and describes the method for collecting and aggregating such data. Further, tools and theory for working with the data are presented. A demonstration analysis provides evidence that personal, individual development and changing cultural norms can be observed with these data and methods.
